The influence of training actives/inactives ratio on machine learning performance

Authors

Rafal Kurczab

Sabina Smusz

Andrzej J. Bojarski

Abstract

In drug discovery, machine learning is widely used to classify molecules as active or inactive against a particular target. The vast majority of these methods (supervised learning) need a training set of objects (molecules) to develop a decision rule that can be used to classify new entities (the test set) into one of the two classes [1]. Many studies have searched for optimal learning parameters and examined their impact on classification effectiveness [2,3]. Unfortunately, there are no data showing how the actives/inactives ratio used in model training influences the efficiency of identifying new active compounds. Therefore, the main goal of this study was to examine the impact of changing the number of inactives in the training set while keeping the number of actives fixed. For a given ratio, the inactives were randomly selected from the ZINC database (10 times, to prevent overestimation error). This concept was verified on three different protein targets (5-HT1A, HIV-1 protease and matrix metalloproteinase) and a set of algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) implemented in the WEKA package [4]. Compounds were represented with two types of molecular fingerprints (MACCS and a hashed fingerprint) to determine their possible impact on machine learning performance.
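The sampling scheme described in the abstract (a fixed set of actives, a varying number of randomly drawn inactives, and repeated draws for each ratio) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `build_training_sets`, the placeholder identifiers, the ratio values and the seed are all hypothetical, and the ratio is assumed here to mean "inactives per active".

```python
import random


def build_training_sets(actives, inactive_pool, ratios, n_repeats=10, seed=0):
    """Yield (ratio, repeat, labelled_set) tuples.

    The actives are kept fixed; for each ratio, the inactives are
    re-drawn at random from the pool n_repeats times.
    Assumption: ratio = number of inactives per active.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    for ratio in ratios:
        n_inactives = int(round(ratio * len(actives)))
        if n_inactives > len(inactive_pool):
            raise ValueError("inactive pool too small for requested ratio")
        for repeat in range(n_repeats):
            # Sample without replacement from the pool of presumed inactives
            sampled = rng.sample(inactive_pool, n_inactives)
            labelled = [(m, 1) for m in actives] + [(m, 0) for m in sampled]
            yield ratio, repeat, labelled


# Hypothetical placeholder identifiers, not real compounds or ZINC records
actives = [f"active_{i}" for i in range(20)]
inactive_pool = [f"pool_{i}" for i in range(1000)]

# Illustrative ratios only; the abstract does not state which ratios were used
training_sets = list(build_training_sets(actives, inactive_pool, ratios=[1, 2, 5]))
```

Each resulting set would then be passed to a classifier, with performance averaged over the repeated draws for a given ratio, so that a single lucky or unlucky sample of inactives does not dominate the estimate.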

Similar resources

The influence of the inactives subset generation on the performance of machine learning methods

BACKGROUND A growing popularity of machine learning methods application in virtual screening, in both classification and regression tasks, can be observed in the past few years. However, their effectiveness is strongly dependent on many different factors. RESULTS In this study, the influence of the way of forming the set of inactives on the classification process was examined: random and dive...

Lead Hopping Using SVM and 3D Pharmacophore Fingerprints

The combination of 3D pharmacophore fingerprints and the support vector machine classification algorithm has been used to generate robust models that are able to classify compounds as active or inactive in a number of G-protein-coupled receptor assays. The models have been tested against progressively more challenging validation sets where steps are taken to ensure that compounds in the validat...

A Machine Learning Approach to Enhance Scoring Performance in Docking-Based Virtual Screening Experiments: COX-1 as a Case Study

Molecular docking can be reasonably successful at reproducing X-ray poses of a ligand in the binding site of a protein. However, scoring functions are typically unsuccessful at correctly ranking ligands according to their binding affinity. Using cyclooxygenase-1 (COX-1), a particularly challenging workhorse in virtual screening (VS) we show how the use of support vector machines (SVMs), trained...

A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...

Volume 5

Publication date: 2013
